Reviews: Verifiable Reinforcement Learning via Policy Extraction

Neural Information Processing Systems

Post rebuttal: Thank you to the authors for the clarification. One minor point I realised concerns the equation between lines 144 and 145. Is this constraint really a disjunction over partitions? If there is at least one partition that the given state does not belong to, the constraint would always be true, since at least one of the inner propositions would hold, wouldn't it? The trained decision tree policy allows for its verification, more specifically in terms of correctness, stability, and robustness.
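To make the reviewer's point concrete, here is a hedged reconstruction; the paper's actual equation is not reproduced above, so the partition names P_i and per-partition properties psi_i are assumptions, not the authors' notation:

```latex
% Hedged sketch, not the paper's equation: P_i are the partitions
% induced by the decision tree and \psi_i(s) the property to check there.
% Written as a disjunction over partitions, the constraint
\bigvee_{i} \bigl( s \in P_i \Rightarrow \psi_i(s) \bigr)
  \;\equiv\; \bigvee_{i} \bigl( s \notin P_i \lor \psi_i(s) \bigr)
% is vacuously true for any state s lying outside at least one P_j,
% because the disjunct s \notin P_j already holds. The reviewer's
% reading suggests the conjunction over partitions was intended:
\bigwedge_{i} \bigl( s \in P_i \Rightarrow \psi_i(s) \bigr)
```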


Verifiable Reinforcement Learning via Policy Extraction

Bastani, Osbert, Pu, Yewen, Solar-Lezama, Armando

Neural Information Processing Systems

While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
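The abstract compresses the algorithm into one sentence, so a minimal sketch of the Q-DAgger-style loop it describes may help. This is an illustrative reconstruction, not the authors' code: it assumes a classic Gym-style environment (reset() -> state, step(a) -> (state, reward, done, info)), discrete actions, and scikit-learn decision trees, with oracle_policy and oracle_q standing in for the trained DNN and its Q-function.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def evaluate(env, tree, n_episodes=5):
    """Mean return of a decision tree policy over a few rollouts."""
    returns = []
    for _ in range(n_episodes):
        s, done, total = env.reset(), False, 0.0
        while not done:
            a = int(tree.predict(np.asarray(s).reshape(1, -1))[0])
            s, r, done, _ = env.step(a)
            total += r
        returns.append(total)
    return float(np.mean(returns))

def viper(env, oracle_policy, oracle_q, n_iters=20, n_rollouts=10, max_depth=8):
    """Sketch of VIPER's loop: roll out the current student, label visited
    states with the oracle's action, and weight each state by how costly a
    wrong action is there, l(s) = max_a Q(s, a) - min_a Q(s, a)."""
    states, actions, weights = [], [], []
    trees, scores = [], []
    student = None  # the first iteration rolls out the oracle itself

    for _ in range(n_iters):
        if student is None:
            act = oracle_policy
        else:
            act = lambda s, t=student: int(t.predict(np.asarray(s).reshape(1, -1))[0])
        for _ in range(n_rollouts):
            s, done = env.reset(), False
            while not done:
                q = np.asarray(oracle_q(s))
                states.append(np.asarray(s))
                actions.append(oracle_policy(s))   # oracle supplies the label
                weights.append(q.max() - q.min())  # loss-sensitive weight
                s, _, done, _ = env.step(act(s))
        student = DecisionTreeClassifier(max_depth=max_depth)
        student.fit(np.stack(states), np.asarray(actions),
                    sample_weight=np.asarray(weights))
        trees.append(student)
        scores.append(evaluate(env, student))

    return trees[int(np.argmax(scores))]  # keep the best tree over all iterations
```

Passing the weights via sample_weight is a common substitution for the paper's approach of resampling the aggregated dataset proportional to l(s); both realize the same loss-weighted classification objective in expectation.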

